Simple task: a transformer has to repeat a sequence of random integers (0-9) of varying length, e.g.:
sequence length=7: input [1, 3, 5, 6, 2, 4, 0] - output [1, 3, 5, 6, 2, 4, 0]
sequence length=3: input [5, 4, 9] - output [5, 4, 9]
sequence length=4: input [6, 3, 9, 8] - output [6, 3, 9, 8]
...
Each integer (0-9) is mapped through an embedding layer so it can be passed to the transformer.
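The data for this task is trivial to generate; a minimal sketch of the dataset described above (the function name and exact sampling scheme are my own assumptions, not from the post):

```python
import random

def make_example(max_len=12, vocab=10):
    """Generate one copy-task example: the target equals the input."""
    length = random.randint(1, max_len)
    seq = [random.randrange(vocab) for _ in range(length)]
    return seq, list(seq)  # input and output are identical copies

# A dataset of 1000 examples, as described in the post
dataset = [make_example() for _ in range(1000)]
```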
I trained a transformer (a generic PyTorch model with positional embeddings) on a dataset of 1000 examples with sequence lengths from 1 to 12, and it predicts sequences well within that range. It fails to predict sequences longer than 12-13:
sequence length=20: input [3, 3, 4, 0, 0, 7, 1, 5, 1, 0, 7, 1, 9, 0, 9, 1, 5, 2, 3, 6]
                  - output [3, 3, 4, 0, 0, 7, 1, 5, 1, 0, 7, 1, 7, 1, 7, 1, 0, 7, 0, 7]
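One likely culprit is the positional encoding: if the positions are a learned embedding table sized to the training lengths (an assumption about the setup here, since the post doesn't say which scheme is used), then positions beyond the longest training length simply have no trained representation. A dependency-free sketch of that failure mode:

```python
import random

d_model = 16        # illustrative embedding width, not from the post
max_train_len = 12  # longest sequence length seen during training

# A learned positional-embedding table has one row per training position.
pos_table = [[random.gauss(0, 1) for _ in range(d_model)]
             for _ in range(max_train_len)]

def pos_embed(position):
    # Positions >= max_train_len have no trained row at all, which is one
    # common reason copy accuracy collapses just past the training lengths.
    if position >= len(pos_table):
        raise IndexError(f"position {position} has no trained embedding")
    return pos_table[position]
```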
Is this considered an extrapolation task? Are there transformer variants (or other neural networks) that can handle this problem?
I hit the same issue with recurrent neural networks (RNN, LSTM, GRU).
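For context on the question above: position schemes that depend only on relative distance (the idea behind ALiBi-style attention biases, for example) are defined for any sequence length, which is why they are often discussed for length extrapolation. A minimal illustrative sketch (the slope value and function name are my own, and this symmetric form is a simplification of the real causal variant):

```python
def relative_bias(n, slope=0.5):
    # Additive attention bias that penalizes each query-key pair in
    # proportion to its distance; it depends only on |i - j|, never on
    # absolute position, so it extends to any length n.
    return [[-slope * abs(i - j) for j in range(n)] for i in range(n)]
```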
submitted by /u/InternationalVisito